Introduction to NumPy

NumPy arrays

The core object of NumPy is the array. NumPy arrays store numerical data efficiently, provide operations to manipulate that data, and can be combined in many ways.

At their simplest, NumPy arrays are similar to Python lists: each contains a collection of data, has a length, can be indexed using square brackets and can be looped over. However, arrays have a number of features that distinguish them from lists. We'll discover these as we go through the course today. Some will feel like restrictions, while others will make arrays more powerful.

Because of these restrictions and advantages, NumPy arrays are well suited to some situations and inappropriate for others. Part of your job while programming is knowing which tool to use in which situation.

We'll start by importing the numpy module. NumPy is not part of the Python standard library, but it is widely available and easy to install. If you are using the Anaconda Python distribution, it comes as a standard part of the base environment.

It is commonly aliased to np:

In [1]:
import numpy as np

There are lots of ways of creating arrays, but the simplest is to pass in an existing Python list to the np.array function:

In [2]:
my_list = [1, 2, 3, 4, 5]
my_array = np.array(my_list)

or you can pass it in directly:

In [3]:
my_array = np.array([1, 2, 3, 4, 5])

This gives us an object, my_array, which you can display:

In [4]:
my_array
Out[4]:
array([1, 2, 3, 4, 5])

Or, if you're working in a .py script, you can print it (note the slightly different output format):

In [5]:
print(my_array)
[1 2 3 4 5]
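The introduction noted that arrays, like lists, have a length and can be looped over. A minimal sketch of both, using Python's built-in len and an ordinary for loop:

```python
import numpy as np

my_array = np.array([1, 2, 3, 4, 5])

# Like a list, an array has a length
print(len(my_array))  # 5

# ...and can be looped over, giving one element at a time
total = 0
for value in my_array:
    total += value
print(total)  # 15
```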

Selecting and editing data

You can access the items in an array in much the same way as in a list. For example, to select the first element:

In [6]:
my_array[0]
Out[6]:
1

To select the last element:

In [7]:
my_array[-1]
Out[7]:
5

To select everything from index 2 to the end (using a slice):

In [8]:
my_array[2:]
Out[8]:
array([3, 4, 5])
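Slices work the same way as for lists: start:stop includes the start index and excludes the stop index, and an optional third value sets the step. A short sketch of a few common forms:

```python
import numpy as np

my_array = np.array([1, 2, 3, 4, 5])

# start:stop includes index 1 but stops before index 4
middle = my_array[1:4]
print(middle)  # [2 3 4]

# the third value is the step: here, every second element
every_other = my_array[::2]
print(every_other)  # [1 3 5]

# negative indices count from the end: the last two elements
last_two = my_array[-2:]
print(last_two)  # [4 5]
```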

NumPy arrays are mutable, so you can edit their values in place by putting the indexed array on the left-hand side of = and assigning a value:

In [9]:
my_array[4] = 999
print(my_array)
[  1   2   3   4 999]

You can see that the last number has changed from 5 to 999.
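You can also edit several values at once by assigning to a slice. A minimal sketch: assigning a single value sets every element in the slice, while assigning a sequence of matching length replaces the elements one-for-one.

```python
import numpy as np

my_array = np.array([1, 2, 3, 4, 5])

# A single value is written into every position of the slice
my_array[1:3] = 0
print(my_array)  # [1 0 0 4 5]

# A sequence of the same length as the slice replaces each element in turn
my_array[3:] = [40, 50]
print(my_array)  # [ 1  0  0 40 50]
```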

Practice

  1. Extract the first three elements (the result should be array([1, 2, 3]))
  2. Set the values of my_array back to [1 2 3 4 5].


Creating pre-filled arrays

It's common to want to create arrays of a certain size with all the values set to something specific.

For example, to create a three-item array filled entirely with 0, NumPy provides a function called np.zeros:

In [10]:
np.zeros(3)
Out[10]:
array([0., 0., 0.])

To create a four-item array filled with 1, you can use np.ones:

In [11]:
np.ones(4)
Out[11]:
array([1., 1., 1., 1.])
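The decimal points in these outputs show that np.zeros and np.ones produce floating-point arrays by default. A short sketch of two variations: passing dtype to get integers instead, and np.full for a fill value other than 0 or 1.

```python
import numpy as np

# np.full fills an array of the given size with any value you choose
sevens = np.full(3, 7)
print(sevens)  # [7 7 7]

# np.zeros (and np.ones) accept a dtype if you want integers instead of floats
int_zeros = np.zeros(3, dtype=int)
print(int_zeros)  # [0 0 0]
```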

You can also create ranges of numbers with np.arange, which works much like Python's built-in range function and takes the same sorts of arguments (plus some optional extras):

In [12]:
np.arange(10)
Out[12]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
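As with range, you can pass a start, a stop (which is excluded) and a step. Unlike range, np.arange also accepts a floating-point step, as this sketch shows. With non-integer steps, rounding errors can make the number of elements hard to predict, which is one reason np.linspace is often preferred in that case.

```python
import numpy as np

# Like range(2, 10, 2): start at 2, stop before 10, step by 2
evens = np.arange(2, 10, 2)
print(evens)  # [2 4 6 8]

# A float step works too: this gives 0.0, 0.25, 0.5 and 0.75
quarters = np.arange(0, 1, 0.25)
print(quarters)
```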

Or, if instead of integers in a range you want 5 evenly spaced numbers between 3 and 6, you can use np.linspace (linearly spaced):

In [13]:
np.linspace(3, 6, 5)
Out[13]:
array([3.  , 3.75, 4.5 , 5.25, 6.  ])
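Because np.linspace includes both endpoints, num points are separated by num - 1 equal gaps. The sketch below checks this for the example above, using np.diff (which returns the differences between neighbouring elements):

```python
import numpy as np

start, stop, num = 3, 6, 5
points = np.linspace(start, stop, num)

# Both endpoints are included, so there are num - 1 gaps between the points
step = (stop - start) / (num - 1)
print(step)  # 0.75

# Every neighbouring pair of points is exactly one step apart
print(np.diff(points))  # [0.75 0.75 0.75 0.75]
```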

Practice

Using these functions, create one or more of:

  • an array which contains [0. , 0.2, 0.4, 0.6, 0.8, 1. ]
  • an array which contains [4, 5, 6, 7, 8, 9]
  • an array which contains [0]


Restrictions of NumPy arrays

There are a number of restrictions on how NumPy arrays can be used, each of which serves a purpose in making them more efficient. The first is that, unlike Python lists, arrays cannot be resized: an array's size is fixed when it is created.

With a Python list you can use the append method:

In [14]:
my_list = [5, 7, 3]
my_list.append(703)
my_list
Out[14]:
[5, 7, 3, 703]

But NumPy arrays have no such method:

In [15]:
my_array.append(3)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [15], in <cell line: 1>()
----> 1 my_array.append(3)

AttributeError: 'numpy.ndarray' object has no attribute 'append'

The reason for this restriction is that an array's data is stored compactly in one block of memory. Adding another number to the end may require moving the entire array to a part of memory with room for the extra number, and copying data in memory is slow.

There is, however, an np.append function that works similarly, but importantly it returns an entirely new array containing a copy of the data, leaving the original array unchanged.
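A minimal sketch of np.append, showing that the result is a new array and the original keeps its size. Since every call copies all the data, calling np.append repeatedly in a loop is slow; a common alternative is to build up a Python list and convert it to an array once at the end.

```python
import numpy as np

original = np.array([5, 7, 3])

# np.append copies the data into a brand-new, larger array
extended = np.append(original, 703)

print(extended)  # [  5   7   3 703]
print(original)  # [5 7 3]  (unchanged)
```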

Data types

Another important way in which NumPy arrays differ from Python lists is that each array can only hold one "type" of data.

The main reason for this is that it lets NumPy perform calculations quickly. If it knows in advance that all the items in an array are, for example, integers, then it can make assumptions that speed up anything you do to the array.

By default, NumPy infers the type from the data you pass in. Because we passed in a list of integers, the data type (or dtype) of the array is:

In [16]:
my_array.dtype
Out[16]:
dtype('int64')

This is a 64-bit integer. On your computer you might get int32 instead.
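If you need a particular integer size regardless of platform, you can name it explicitly, for example np.int32 or np.int64. Because every element takes the same fixed number of bytes, the attributes itemsize (bytes per element) and nbytes (total bytes of data) tell you exactly how much memory the values occupy, as this sketch shows:

```python
import numpy as np

# Request a specific integer size instead of the platform default
small = np.array([1, 2, 3, 4, 5], dtype=np.int32)
big = np.array([1, 2, 3, 4, 5], dtype=np.int64)

# itemsize is bytes per element; nbytes is itemsize times the number of elements
print(small.itemsize, small.nbytes)  # 4 20
print(big.itemsize, big.nbytes)      # 8 40
```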

When you create the array you can specify the dtype:

In [17]:
my_int_array = np.array([1, 2, 3], dtype=int)
my_int_array.dtype
Out[17]:
dtype('int64')
In [18]:
my_float_array = np.array([1, 2, 3], dtype=float)
my_float_array.dtype
Out[18]:
dtype('float64')

Note that even though you passed in integer values when creating my_float_array, when you print the array, it shows them with a decimal point:

In [19]:
print(my_float_array)
[1. 2. 3.]
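When the input list mixes integers and floats, NumPy picks a dtype that can hold all of the values, which here is float64. To convert an existing array, the astype method returns a new array with the requested dtype. A short sketch:

```python
import numpy as np

# A mix of ints and floats is stored as floats
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

int_array = np.array([1, 2, 3])
# astype returns a new array; int_array itself keeps its integer dtype
as_floats = int_array.astype(float)
print(as_floats)  # [1. 2. 3.]
```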

Also note that if you assign a float to an element of an array with an integer dtype, NumPy silently discards the decimal places (it truncates rather than rounds):

In [20]:
my_int_array[1] = 47.769
print(my_int_array)
[ 1 47  3]
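Because the decimal places are simply dropped, negative values move towards zero rather than down. If you want the nearest integer instead, round explicitly before assigning. A minimal sketch of both behaviours:

```python
import numpy as np

my_int_array = np.array([1, 2, 3])

# The fractional part is discarded, so -2.7 becomes -2 (towards zero)
my_int_array[0] = -2.7
print(my_int_array)  # [-2  2  3]

# Rounding first gives the nearest integer: 47.769 becomes 48
my_int_array[1] = np.round(47.769)
print(my_int_array)  # [-2 48  3]
```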